DETAILED ACTION
Response to Amendment 

1. Applicant's arguments filed 1/25/2006 have been fully considered but they are
not persuasive. The term "quantities characterizing elements of the graph" does not
expressly indicate that "sequences of segments from the source utterances can be
selected based on the unit labels of those segments and transition costs that are based
on the unit labels" (page 4, lines 1-5 of the specification) rather than based on
characteristics of the segments. Therefore, examiner treats the step of "determining a
numerical score that characterizes a quality of a concatenation of the sequence of
segments based on quantities characterizing elements of the graph" (claims 1 and 11)
as computation of concatenation cost based on characteristics of the segments.

2. Applicant's arguments with respect to claims 1-20 have been considered but are 
moot in view of the new ground(s) of rejection necessitated by claim amendment and 
introduction of new claims 19-20.

Claim Objections 

3. Claims 12-13 are objected to because of the following informalities: there is a 
lack of antecedent basis. Claims 12-13 should not depend on claim 10, but rather
depend on claim 11. Examiner treated claims 12-13 as being dependent upon claim 11.
Appropriate correction is required.

Claim Rejections - 35 USC § 102

4. The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that
form the basis for the rejections under this section made in this Office action: 

A person shall be entitled to a patent unless - (b) the invention was patented or described in a printed 
publication in this or a foreign country or in public use or on sale in this country, more than one year prior to 
the date of application for patent in the United States. 

5. Claims 1-10 and 18 are rejected under 35 U.S.C. 102(b) as being anticipated by 
Hunt et al. (IEEE Publication). 

6. Regarding claims 1 and 18, Hunt et al. disclose a method and a software stored 
on a computer-readable medium for selecting segments from a corpus of source 
utterances for synthesizing a target utterance, comprising: searching a graph in which 
each path through the graph identifies a sequence of segments of the corpus of source 
utterances and a corresponding sequence of unit and transition labels that 
characterizes a pronunciation of a concatenation of that sequence of segments, each 
path determining a numerical score that characterizes a quality of a concatenation of 
the sequence of segments based on quantities characterizing elements of the graph 
(sections 2.1-2.2 on pages 374-375); wherein searching the graph includes matching a
pronunciation of the target utterance represented by unit labels and transition labels to 
paths through the graph, and selecting segments for synthesizing the target utterance 
based on the numerical scores of matching paths through the graph (sections 2.1-2.2 
on pages 374-375, Viterbi search algorithm propagates through the graph and picks the 
best paths).

7. Regarding claims 2-3 and 5, Hunt et al. further disclose the method of claim 1
wherein selecting segments for synthesizing the target utterance includes identifying a
path through the graph that matches the pronunciation of the target utterance and
selecting the sequence of segments that is identified by the determined path (sections
2.1-2.2 on pages 374-375, one best path is selected based on "concatenation cost"),
wherein determining the path includes determining a best scoring path through the
graph (sections 2.1-2.2 on pages 374-375, one best path is selected based on
"concatenation cost"), and concatenating the selected sequence of segments to form a
waveform representation of the target utterance (sections 2.1-2.2 on pages 374-375).
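
For illustration only, a best-path search of the kind discussed above can be sketched as follows. This is a minimal sketch under stated assumptions, not code from the cited reference: the names `viterbi_select`, `target_cost`, and `join_cost` are hypothetical, units are represented as (utterance id, position) tuples, and the total score is assumed to be a simple sum of per-unit and per-join costs.

```python
from typing import Callable, List, Sequence, Tuple

Unit = Tuple[int, int]  # hypothetical: (source utterance id, position in utterance)


def viterbi_select(
    candidates: Sequence[Sequence[Unit]],
    target_cost: Callable[[int, Unit], float],
    join_cost: Callable[[Unit, Unit], float],
) -> Tuple[float, List[Unit]]:
    """Return (total cost, unit sequence) for the cheapest path.

    candidates[i] lists the corpus units that could realize target position i.
    """
    if not candidates or any(len(c) == 0 for c in candidates):
        raise ValueError("every target position needs at least one candidate")

    # best[u] = (cost of the cheapest partial path ending in u, that path)
    best = {u: (target_cost(0, u), [u]) for u in candidates[0]}

    for i in range(1, len(candidates)):
        nxt = {}
        for u in candidates[i]:
            # Pick the predecessor that minimizes accumulated cost plus join cost.
            prev, (cost, path) = min(
                best.items(), key=lambda kv: kv[1][0] + join_cost(kv[0], u)
            )
            nxt[u] = (cost + join_cost(prev, u) + target_cost(i, u), path + [u])
        best = nxt

    return min(best.values(), key=lambda cp: cp[0])
```

Only the cheapest partial path per candidate unit is kept at each position, so the work grows with the number of candidate pairs between adjacent positions rather than with the number of complete paths.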

8. Regarding claims 6-8, Hunt et al. further disclose the method of claim 1 wherein 
selecting the segments for synthesizing the target utterance includes determining a 
plurality of paths through the graph that each matches the representation of the 
pronunciation of the target utterance (sections 2.1-2.2 on pages 374-375), wherein 
selecting the segments further includes forming a plurality of sequences of segments,
each associated with a different one of the plurality of paths (sections 2.1-2.2 on pages
374-375, inherent in Viterbi search algorithm), and wherein selecting the segments
further includes selecting one of the sequences of segments based on characteristics of 
those sequences of segments not determined by the corresponding sequences of unit 
labels associated with those sequences (sections 2.1-2.2 on pages 374-375, one best 
sequence is selected based on the "concatenation cost").

9. Regarding claims 9-10, Hunt et al. further disclose the method of claim 1 further
comprising forming a representation of a plurality of pronunciations of the target 
utterance, and wherein searching the graph includes matching any of the 
pronunciations of the target utterance to paths through the graph (sections 2.1-2.2 on 
pages 374-375, "forced aligning"), and forming a representation of the pronunciation of
the target utterance in terms of alternating unit labels and transition labels (sections
2.1-2.2 on pages 374-375, concatenation of units). 

Claim Rejections - 35 USC § 103 

10. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

11. Claim 4 is rejected under 35 U.S.C. 103(a) as being unpatentable over Hunt et
al. (IEEE Publication). 

12. Regarding claim 4, Hunt et al. disclose a method for selecting acoustic units in a
concatenative speech synthesis system using a Viterbi search algorithm, but fail to
specifically disclose that the step of determining the best scoring path involves using a
dynamic programming algorithm. However, examiner takes official notice that dynamic
programming is well known in the art. The advantage of using dynamic programming is to
improve execution speed.

13. Claims 11-13 are rejected under 35 U.S.C. 103(a) as being unpatentable over 
Hunt et al. (IEEE Publication) in view of Beutnagel et al. (applicant's admitted prior art, 
incorporated by reference). 

14. Regarding claim 11, Hunt et al. disclose a method for selecting segments from a
corpus of source utterances for synthesizing a target utterance, comprising: searching
a graph in which each path through the graph identifies a sequence of segments of the
corpus of source utterances and a corresponding sequence of unit labels that
characterizes a pronunciation of a concatenation of that sequence of segments, each
path determining a numerical score that characterizes a quality of a concatenation of
the sequence of segments (sections 2.1-2.2 on pages 374-375); wherein searching the graph includes
matching a pronunciation of the target utterance to paths through the graph, and 
selecting segments for synthesizing the target utterance based on the numerical scores 
of matching paths through the graph (sections 2.1-2.2 on pages 374-375, Viterbi search 
algorithm propagates through the graph and picks the best paths). 

Hunt et al. fail to disclose wherein the graph includes a first part that encodes a 
sequence of segments and a corresponding sequence of unit labels for each of the 
source utterances, and a second part that encodes allowable transitions between 
segments of different source utterances and encodes a transition score for each of 
those transitions; and matching the pronunciation of the target utterance to paths
through the graph includes considering paths in which each transition between 
segments of different source utterances identified by that path corresponds to a different 
sub-path of that path that passes through the second part of the graph. 

However, Beutnagel et al. teach the graph including a first part that encodes a 
sequence of segments and a corresponding sequence of unit labels for each of the 
source utterances, and a second part that encodes allowable transitions between 
segments of different source utterances and encodes a transition score for each of 
those transitions (sections 4.1-4.3, pre-computing and caching all the possible join
costs); and matching the pronunciation of the target utterance to paths through the
graph includes considering paths in which each transition between segments of different
source utterances identified by that path corresponds to a different sub-path of that path
that passes through the second part of the graph (sections 4.1-4.3, pre-computing and
caching all the possible join costs for use at runtime to reduce computing time).

Because Hunt et al. and Beutnagel et al. are analogous art from the same field of
endeavor, it would have been obvious to one of ordinary skill in the art at the time of
invention to modify Hunt et al. by incorporating the teaching of Beutnagel et al. in
order to reduce search time at runtime and thereby improve the system's speed.
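
The pre-computing and caching rationale relied on above can be illustrated with a short sketch. It is offered under stated assumptions only and is not taken from the cited reference: the names `precompute_join_costs` and `CachedJoinCost` are hypothetical, the cache is assumed to be a plain in-memory table keyed by ordered unit pairs, and a direct computation is assumed as the fallback when a pair was not cached.

```python
from itertools import permutations
from typing import Callable, Dict, Hashable, Iterable, Tuple

Pair = Tuple[Hashable, Hashable]


def precompute_join_costs(
    units: Iterable[Hashable],
    join_cost: Callable[[Hashable, Hashable], float],
) -> Dict[Pair, float]:
    """Offline step: evaluate join_cost once for every ordered pair of units."""
    return {(a, b): join_cost(a, b) for a, b in permutations(units, 2)}


class CachedJoinCost:
    """Runtime lookup: use the table, fall back to direct computation on a miss."""

    def __init__(
        self,
        table: Dict[Pair, float],
        fallback: Callable[[Hashable, Hashable], float],
    ) -> None:
        self._table = table
        self._fallback = fallback
        self.misses = 0  # number of lookups that were not served from the table

    def __call__(self, a: Hashable, b: Hashable) -> float:
        cost = self._table.get((a, b))
        if cost is None:
            self.misses += 1
            cost = self._fallback(a, b)
        return cost
```

An instance of `CachedJoinCost` can be passed wherever a join-cost function is expected, so a search routine pays only a table lookup for cached pairs at runtime.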

15. Regarding claims 12-13, Hunt et al. fail to specifically disclose the method of 
claim 11, wherein selecting the segments for synthesis includes evaluating a score for
each of the considered paths that is based on the transition scores associated with the 
sub-paths through the second part of the graph, and wherein a size of the second part
of the graph is substantially independent of a size of the source corpus, and a 
complexity of matching the pronunciation through the graph grows less than linearly 
with the size of the corpus. However, Beutnagel et al. teach the step of selecting the 
segments for synthesis includes evaluating a score for each of the considered paths 
that is based on the transition scores associated with the sub-paths through the second 
part of the graph (sections 4.1-4.3), and wherein a size of the second part of the graph 
is substantially independent of a size of the source corpus, and a complexity of 
matching the pronunciation through the graph grows less than linearly with the size of 
the corpus (sections 4.1-4.3, pre-computed and cached possible join costs; units are
available for use by the speech synthesis system).

Because Hunt et al. and Beutnagel et al. are analogous art from the same field of
endeavor, it would have been obvious to one of ordinary skill in the art at the time of
invention to modify Hunt et al. by incorporating the teaching of Beutnagel et al. in
order to reduce search time at runtime and thereby improve the system's speed.

16. Claims 14-16 are rejected under 35 U.S.C. 103(a) as being unpatentable over 
Beutnagel et al. (applicant's admitted prior art, incorporated by reference) in view of 
Hunt et al. (IEEE Publication). 

17. Regarding claim 14, Beutnagel et al. disclose a method comprising: providing the 
corpus of source utterances, each source utterance being segmented into a sequence 
of segments, each consecutive pair of segments in a source utterance forming a
segment boundary, and each speech segment being associated with a unit label and
each segment boundary being associated with a transition label (sections 4.1-4.3,
pre-computed and cached possible join costs; units are available for use by the speech
synthesis system); and forming the graph, including forming a first part of the graph that
encodes a sequence of segments and a corresponding sequence of unit labels and
transition labels for each of the source utterances, and forming a second part that
encodes allowable transitions between segments of different source utterances and
encodes a transition score for each of those transitions (sections 4.1-4.3, pre-computed
and cached possible join costs; units are available for use by the speech synthesis
system). Beutnagel et al. fail to specifically disclose the step of matching a
pronunciation of a target utterance represented using unit and transition labels to one or 
more paths in the graph and identifying a sequence of segments for each of the paths. 
However, Hunt et al. teach the step of matching a pronunciation of a target utterance 
represented using unit and transition labels to one or more paths in the graph and 
identifying a sequence of segments for each of the paths (sections 2-2.2). 

Because Beutnagel et al. and Hunt et al. are analogous art from the same field of
endeavor, it would have been obvious to one of ordinary skill in the art at the time of
invention to modify Beutnagel et al. by incorporating the teaching of Hunt et al. in
order to select the most appropriate units for synthesis.

18. Regarding claim 15, Beutnagel et al. further disclose the method of claim 14
wherein forming the second part of the graph is performed independently of the 
utterances in the corpus of source utterances (cache). 

19. Regarding claim 16, Beutnagel et al. further disclose the method of claim 14 
further comprising: augmenting the corpus of source utterances with additional 
utterances (sections 4.1-4.3, Viterbi algorithm searches the graph and picks best units
and path); and augmenting the graph including augmenting the first part of the graph to
encode the additional utterances, and linking the augmented first part to the second part
without modifying the second part based on the additional utterances (sections 4.1-4.3,
pre-computing and caching all possible join costs for use by the speech synthesis
system).

20. Claims 19-20 are rejected under 35 U.S.C. 103(a) as being unpatentable over 
Beutnagel et al. (applicant's admitted prior art, incorporated by reference) in view of 
Hunt et al. (IEEE Publication), and further in view of Mohri et al. (US 6243679). 

21. Regarding claims 19-20, the modified Beutnagel et al. fail to disclose that the
graph is associated with a finite-state transducer which accepts input symbols that 
include unit labels and transition labels, and wherein matching the pronunciation of the 
target utterance to the paths in the graph includes composing a finite-state transducer 
representation of a pronunciation of the target utterance with the finite-state transducer 
with which the graph is associated. However, Mohri et al. teach that the graph is
associated with a finite-state transducer which accepts input symbols that include unit
labels and transition labels (col. 10, ln. 28 to col. 11, ln. 67), wherein matching the
pronunciation of the target utterance to the paths in the graph includes composing a
finite-state transducer representation of a pronunciation of the target utterance with the
finite-state transducer with which the graph is associated (col. 11, ln. 31-67).

Because the modified Beutnagel et al. and Mohri et al. are analogous art from the
same field of endeavor, it would have been obvious to one of ordinary skill in the art
at the time of invention to modify Beutnagel et al. by incorporating the teaching of
Mohri et al. in order to achieve time and space minimization efficiencies (col. 1,
ln. 60 to col. 2, ln. 2).

22. Claim 17 is rejected under 35 U.S.C. 103(a) as being unpatentable over Hunt et 
al. (IEEE Publication) in view of Mohri et al. (US 6243679). 

23. Regarding claim 17, Hunt et al. do not disclose that the graph is associated with 
a finite-state transducer which accepts input symbols that include unit labels and 
transition labels, and that produces identifiers of segments of the source utterances, 
and wherein searching the graph is equivalent to composing a finite-state transducer 
representation of a pronunciation of the target utterance with the finite-state transducer 
with which the graph is associated.

However, Mohri et al. teach that the graph is associated with a finite-state
transducer which accepts input symbols that include unit labels and transition labels,
and that produces identifiers of segments of the source utterances (col. 10, ln. 28 to col.
11, ln. 67), and wherein searching the graph is equivalent to composing a finite-state
transducer representation of a pronunciation of the target utterance with the finite-state
transducer with which the graph is associated (col. 11, ln. 31-67).

Because Hunt et al. and Mohri et al. are analogous art from the same field of
endeavor, it would have been obvious to one of ordinary skill in the art at the time of
invention to modify Hunt et al. by incorporating the teaching of Mohri et al. in order
to achieve time and space minimization efficiencies (col. 1, ln. 60 to col. 2, ln. 2).
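
As an illustration of what composing a pronunciation with a graph transducer involves, the following is a simplified sketch under stated assumptions, not the method of the cited patent. The pronunciation is assumed to be a single linear label sequence, the transducer is assumed to be epsilon-free with additive weights, and the names `compose_linear`, `Arc`, and `Fst` are hypothetical. Rather than building the composed machine explicitly, the sketch walks both inputs in lockstep and lists the output sequences (segment identifiers) of every accepted path.

```python
from typing import Dict, List, Sequence, Set, Tuple

# (input label, output label, weight, destination state)
Arc = Tuple[str, str, float, int]
Fst = Dict[int, List[Arc]]


def compose_linear(
    labels: Sequence[str],
    fst: Fst,
    start: int,
    finals: Set[int],
) -> List[Tuple[float, List[str]]]:
    """Match a linear label sequence against an epsilon-free transducer.

    Returns (total weight, output labels) for each accepted path, sorted so
    that the lowest-weight path comes first.
    """
    # Each frontier entry: (transducer state, accumulated weight, outputs so far)
    frontier = [(start, 0.0, [])]
    for label in labels:
        nxt = []
        for state, weight, outs in frontier:
            for ilabel, olabel, arc_weight, dest in fst.get(state, []):
                if ilabel == label:  # arc input must match the pronunciation
                    nxt.append((dest, weight + arc_weight, outs + [olabel]))
        frontier = nxt
    return sorted((w, outs) for state, w, outs in frontier if state in finals)
```

For example, a transducer with two arcs for label "a" and one arc for label "b" yields two accepted paths for the input ["a", "b"], and the first entry of the result is the lower-weight one.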

Conclusion 

Applicant's amendment necessitated the new ground(s) of rejection presented in 
this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP 
§ 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 
CFR 1.136(a). 

A shortened statutory period for reply to this final action is set to expire THREE 
MONTHS from the mailing date of this action. In the event a first reply is filed within 
TWO MONTHS of the mailing date of this final action and the advisory action is not 
mailed until after the end of the THREE-MONTH shortened statutory period, then the 
shortened statutory period will expire on the date the advisory action is mailed, and any 
extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of
the advisory action. In no event, however, will the statutory period for reply expire later
than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Huyen X. Vo whose telephone number is 571-272-7631.
The examiner can normally be reached on M-F, 9-5:30. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's
supervisor, Richemond Dorvil, can be reached on 571-272-7602. The fax phone
number for the organization where this application or proceeding is assigned is
571-273-8300.

Information regarding the status of an application may be obtained from the
Patent Application Information Retrieval (PAIR) system. Status information for
published applications may be obtained from either Private PAIR or Public PAIR.
Status information for unpublished applications is available through Private PAIR only.
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should
you have questions on access to the Private PAIR system, contact the Electronic
Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a
USPTO Customer Service Representative or access to the automated information
system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




